
FIGURE 3.11
The curves of the elements in M-Filter 1 (M_1), M-Filter 2 (M_2), M-Filter 3 (M_3), and M-Filter 4 (M_4) (in Fig. 3.2(a) and Eq. 3.12) during training on the CIFAR experiment. The values of the nine elements in each M-Filter are learned similarly to their averages (dotted lines). This validates that the special MCNs-1, with a single average element in each M_j matrix, is reasonable and compact without a large performance loss.

reconstructing full-precision convolutional filters from binarized filters, limiting their use in computationally limited environments. It has been theoretically and quantitatively demonstrated that simplifying the convolution procedure via binarized kernels and approximating the original unbinarized kernels is a very promising solution for DCNNs' compression.

Although prior BNNs significantly reduce storage requirements, they generally suffer considerable accuracy degradation compared to networks that use full-precision kernels and activations. This is mainly due to two issues. First, CNN binarization can essentially be cast as a discrete optimization problem within the backpropagation (BP) process, yet this perspective is seldom exploited; discrete optimization methods can often guarantee the quality of the solutions they find and lead to much better performance in practice [66, 119, 127]. Second, the loss caused by the binarization of CNNs has not been well studied.
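
As a point of reference for what such a binarization loss looks like, the generic objective below measures the error between a full-precision kernel and its closest binary approximation; the symbols W, B, and alpha are used here only for illustration and are not the formulation introduced later in this chapter:

\min_{\alpha > 0,\; B \in \{-1,+1\}^{n}} \; \| W - \alpha B \|_2^2 ,

where W denotes a full-precision kernel flattened to n elements, B its binary counterpart, and alpha a scaling factor.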

We propose a new discrete backpropagation via projection (DBPP) algorithm to efficiently build our projection convolutional neural networks (PCNNs) [77] and obtain highly accurate yet robust BNNs. Theoretically, we derive a projection loss by taking advantage of our DBPP algorithm's ability to perform discrete optimization for model compression. A further advantage of the projection loss is that it can be jointly learned with the conventional cross-entropy loss in the same backpropagation pipeline. The two losses are simultaneously optimized in continuous and discrete spaces and optimally combined by the projection approach in a theoretical framework, which enriches the diversity and thus improves the modeling capacity. As shown in Fig. 3.12, we develop a generic projection convolution layer that can be used in existing convolutional networks. Both the quantized kernels and the projection are jointly optimized in an end-to-end manner. Our projection matrices are optimized during training but are not used for inference, resulting in a compact and efficient learning architecture. As a general framework, other loss functions (e.g., center loss) can also be used to further improve the performance of our PCNNs based on a progressive optimization method.
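
To make the joint optimization concrete, the sketch below shows one way the two losses can be combined in a single training step. It is a minimal PyTorch-style illustration under our own assumptions, not the actual PCNN implementation: the class BinaryConv2d, the helper projection_loss, the per-layer scaling by the mean absolute weight, and the weighting factor lam are hypothetical choices made only for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryConv2d(nn.Conv2d):
    # Convolution whose kernels are projected onto {-a, +a} in the forward pass,
    # while the full-precision kernels are kept and updated by backpropagation.
    def forward(self, x):
        w = self.weight
        # Project each kernel onto a binary set; scaling by the mean absolute
        # value is one common choice, assumed here for illustration.
        scale = w.abs().mean()
        w_bin = scale * torch.sign(w)
        # Straight-through estimator: binarized weights in the forward pass,
        # gradients flow to the full-precision weights in the backward pass.
        w_ste = w + (w_bin - w).detach()
        return F.conv2d(x, w_ste, self.bias, self.stride, self.padding)

def projection_loss(model, lam=1e-4):
    # Penalize the distance between full-precision kernels and their projections.
    loss = 0.0
    for m in model.modules():
        if isinstance(m, BinaryConv2d):
            w = m.weight
            w_bin = w.abs().mean() * torch.sign(w)
            loss = loss + (w - w_bin).pow(2).sum()
    return 0.5 * lam * loss

def train_step(model, x, y, optimizer):
    # One training step: the cross-entropy and projection losses are added
    # and minimized together in the same backpropagation pass.
    optimizer.zero_grad()
    logits = model(x)
    loss = F.cross_entropy(logits, y) + projection_loss(model)
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch, the cross-entropy term trains the network through the binarized kernels via the straight-through estimator, while the projection term pulls the full-precision kernels toward their discrete projections, echoing the idea of optimizing the two losses simultaneously in continuous and discrete spaces.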

Discrete optimization is one of the hot topics in mathematics and is widely used to solve computer vision problems [119, 127]. Conventionally, a discrete optimization problem is solved by searching for an optimal set of discrete values with respect to minimizing a loss function. This chapter proposes a new discrete backpropagation algorithm that uses a projection function to binarize or quantize the input variables in a unified framework. Due to the flex-